Skip to content

[SPARK-56910][SQL] Simplify Cast to byte/short codegen under ANSI mode#55935

Closed
gengliangwang wants to merge 1 commit into
apache:masterfrom
gengliangwang:SPARK-56910-cast-byte-short
Closed

[SPARK-56910][SQL] Simplify Cast to byte/short codegen under ANSI mode#55935
gengliangwang wants to merge 1 commit into
apache:masterfrom
gengliangwang:SPARK-56910-cast-byte-short

Conversation

@gengliangwang
Copy link
Copy Markdown
Member

@gengliangwang gengliangwang commented May 17, 2026

What changes were proposed in this pull request?

Introduce CastUtils.java with nine static helpers for ANSI overflow-checked narrowing to byte / short, and use them from Cast.scala (both codegen and eval paths).

Helpers added:

  • shortToByteExact(short), intToByteExact(int), longToByteExact(long)
  • intToShortExact(int), longToShortExact(long)
  • floatToByteExact(float), doubleToByteExact(double)
  • floatToShortExact(float), doubleToShortExact(double)

ByteExactNumeric / ShortExactNumeric only expose same-type identity narrowing (their toByte(byte) / toShort(short) are trivial), so unlike the int / long targets refactored in #55934 — which delegate to LongExactNumeric.toInt / FloatExactNumeric.toInt / DoubleExactNumeric.toInt / toLong — there is no existing Scala object to route the byte/short narrowing through. The Java helper is the cleanest fit.

Cast.scala changes:

  • castIntegralTypeToIntegralTypeExactCode: the byte / short branch (previously an inline 5-line if/throw block) emits a single CastUtils.${integralPrefix(from)}To${target.capitalize}Exact($c) call. The int branch (added in [SPARK-56909][SQL] Simplify Cast to int/long codegen under ANSI mode #55934) is unchanged.
  • castFractionToIntegralTypeCode: the byte / short branch (previously an inline 5-line floor/ceil block plus lowerAndUpperBound) emits a single CastUtils.${fractionalPrefix(from)}To${target.capitalize}Exact($c) call. The int / long branch (added in [SPARK-56909][SQL] Simplify Cast to int/long codegen under ANSI mode #55934) is unchanged. The now-unused lowerAndUpperBound Scala helper is removed.
  • Eval paths for castToByte and castToShort add ANSI cases for ShortType / IntegerType / LongType / FloatType / DoubleType source types that delegate to the new helpers, replacing the existing multi-line exactNumeric.toInt(b) + bounds-check body.
  • Two small integralPrefix(from: DataType) / fractionalPrefix(from: DataType) Scala helpers handle the method-name dispatch.

Why are the changes needed?

Part of SPARK-56908 (umbrella). The byte/short narrowing ANSI bodies were 5 lines each across 8 codegen call sites; this PR collapses them to one line per call site, matching the int/long target work merged in #55934.

Does this PR introduce any user-facing change?

No. The compiled behavior is identical; only the emitted Java source text changes.

How was this patch tested?

build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite *CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite"

307/307 pass.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.x

@gengliangwang
Copy link
Copy Markdown
Member Author

gengliangwang commented May 17, 2026


Stack overview (SPARK-56908 umbrella)

This PR is part of the SPARK-56908 codegen-simplification series. Current status:

Merged:

Open:

@gengliangwang
Copy link
Copy Markdown
Member Author

Audited this PR for the same lessons surfaced by @viirya and @cloud-fan on #55938 (and applied to #55934 / #55939):

  1. Are the helpers redundant with an existing Scala object? No — ByteExactNumeric / ShortExactNumeric only have trivial toByte(byte) / toShort(short) (same-type). Cross-type narrowing with bounds check (e.g. int -> byte throwing castingCauseOverflowError) doesn't exist on any *ExactNumeric object, so the new intToByteExact, longToByteExact, ... helpers are genuinely net-new.
  2. Are the eval-path additions redundant? No — master castToByte / castToShort ANSI bodies are multi-line (call exactNumeric.toInt(b) with a try/catch, then bounds-check the int down to byte/short). The new helpers consolidate those bodies into a one-line call.

So no changes needed on this PR for that review.

Extend `CastUtils.java` with helpers for `byte` and `short` ANSI cast
targets and use them from `Cast.scala`. Drops the byte/short-target
dispatch (and the now-unused `lowerAndUpperBound` Scala helper) added
in SPARK-56909 -- after this PR, all integral and fractional narrowing
ANSI casts share the same `CastUtils.<...>Exact` one-line codegen.

Helpers added:
* `shortToByteExact(short)`, `intToByteExact(int)`, `longToByteExact(long)`
* `intToShortExact(int)`, `longToShortExact(long)`
* `floatToByteExact(float)`, `doubleToByteExact(double)`
* `floatToShortExact(float)`, `doubleToShortExact(double)`

`Cast.scala` changes:
* `castIntegralTypeToIntegralTypeExactCode` / `castFractionToIntegralTypeCode`
  no longer dispatch on target type -- the helper-name pattern
  `${integralPrefix(from)}To${target.capitalize}Exact` covers all four
  target types.
* Eval paths for `castToByte` and `castToShort` add ANSI cases for
  `ShortType` / `IntegerType` / `LongType` / `FloatType` / `DoubleType`
  source types that delegate to the new helpers; the existing
  `exactNumeric.toInt(b) + bounds-check` fallback now only handles the
  remaining `Decimal` source.

Part of SPARK-56908 (umbrella). The original byte/short ANSI cast bodies
were 5 lines each across 8 call sites; this PR collapses them to one
line per call site, matching the int/long target work from SPARK-56909.

No. The compiled behavior is identical; only the emitted Java source
text changes.

```
build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite \
  *CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite \
  *ExpressionClassIdentitySuite"
```

312/312 pass.

Generated-by: Cursor 1.x
Copy link
Copy Markdown
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Prior state and problem: Cast.scala had two helpers (castIntegralTypeToIntegralTypeExactCode, castFractionToIntegralTypeCode) that emitted multi-line Java bodies for ANSI byte/short narrowing — an inline if (overflow) throw for integral sources and a 5-line floor/ceil bounds-check block for fractional sources, across 8 codegen call sites. #55934 already collapsed the int/long target case by routing through existing *ExactNumeric Scala objects, but those objects don't expose cross-type narrowing to byte/short (their toByte(byte) / toShort(short) are same-type identities), so the byte/short case still had the inline blocks.

Design approach: Introduce a CastUtils.java helper class holding 9 single-line static narrowing methods. Each method holds source/target DataType references in private static final fields, eliminating per-row references[] lookups vs. the old codegen path. Both byte/short branches in the two codegen helpers, and the corresponding eval-path ANSI cases (castToByte / castToShort), route through the new helpers. A Java class rather than a Scala object avoids the getClass.getCanonicalName.stripSuffix("$") boilerplate that Scala-object access from codegen requires.

Key design decisions:

  • Java helper class rather than extending *ExactNumeric (which would force public API changes on those objects for a narrow internal use case).
  • Static DataType fields rather than passing from/to through codegen references[] (small per-row win).
  • Per-source-type case in eval paths (case IntegerType if ansiEnabled => ...) rather than continuing to flow through the generic case x: NumericType if ansiEnabled block; Decimal still falls through.

Implementation sketch:

  • CastUtils.java: 9 static helpers with single-statement bodies, shared error path via QueryExecutionErrors.castingCauseOverflowError.
  • Cast.scala: byte/short branches emit CastUtils.${prefix}To${Target}Exact($c); two new integralPrefix / fractionalPrefix private helpers map DataType to method-name prefix; the no-longer-used lowerAndUpperBound helper is removed.

Behavior verification: Traced edge cases (NaN, ±Inf, large floats, boundary values 127.5f / -128.5f / 128.0f, integer-but-not-byte values like 200, in-range values) for all source types — old and new paths produce identical results and identical error-message arguments (value, source type, target type). PR description's claim that only emitted Java text changes is verified.

LGTM modulo one optional follow-up below.

// Byte / short narrowing: call the matching CastUtils helper. Existing *ExactNumeric
// objects don't expose cross-type narrowing to byte / short (their toByte / toShort are
// same-type identities), so a Java helper is the cleanest fit.
val castUtils = classOf[CastUtils].getName
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (optional, fine to defer to a follow-up): now that this else-branch no longer calls ctx.addReferenceObj("from"/"to", ...), both ctx: CodegenContext and to: DataType are unused in castIntegralTypeToIntegralTypeExactCode — and the same is true for castFractionToIntegralTypeCode just below. Since the SPARK-56908 series is about simplification, it would be consistent to drop both params from both helpers (and from the 5+5 = 10 call sites). Up to you whether to do it here or as a tiny follow-up.

@gengliangwang
Copy link
Copy Markdown
Member Author

@cloud-fan Thanks for the review. Merging to master/4.x

gengliangwang added a commit that referenced this pull request May 27, 2026
### What changes were proposed in this pull request?

Introduce `CastUtils.java` with nine static helpers for ANSI overflow-checked narrowing to `byte` / `short`, and use them from `Cast.scala` (both codegen and eval paths).

Helpers added:
* `shortToByteExact(short)`, `intToByteExact(int)`, `longToByteExact(long)`
* `intToShortExact(int)`, `longToShortExact(long)`
* `floatToByteExact(float)`, `doubleToByteExact(double)`
* `floatToShortExact(float)`, `doubleToShortExact(double)`

`ByteExactNumeric` / `ShortExactNumeric` only expose same-type identity narrowing (their `toByte(byte)` / `toShort(short)` are trivial), so unlike the `int` / `long` targets refactored in #55934 — which delegate to `LongExactNumeric.toInt` / `FloatExactNumeric.toInt` / `DoubleExactNumeric.toInt` / `toLong` — there is no existing Scala object to route the byte/short narrowing through. The Java helper is the cleanest fit.

`Cast.scala` changes:
* `castIntegralTypeToIntegralTypeExactCode`: the `byte` / `short` branch (previously an inline 5-line if/throw block) emits a single `CastUtils.${integralPrefix(from)}To${target.capitalize}Exact($c)` call. The `int` branch (added in #55934) is unchanged.
* `castFractionToIntegralTypeCode`: the `byte` / `short` branch (previously an inline 5-line floor/ceil block plus `lowerAndUpperBound`) emits a single `CastUtils.${fractionalPrefix(from)}To${target.capitalize}Exact($c)` call. The `int` / `long` branch (added in #55934) is unchanged. The now-unused `lowerAndUpperBound` Scala helper is removed.
* Eval paths for `castToByte` and `castToShort` add ANSI cases for `ShortType` / `IntegerType` / `LongType` / `FloatType` / `DoubleType` source types that delegate to the new helpers, replacing the existing multi-line `exactNumeric.toInt(b) + bounds-check` body.
* Two small `integralPrefix(from: DataType)` / `fractionalPrefix(from: DataType)` Scala helpers handle the method-name dispatch.

### Why are the changes needed?

Part of SPARK-56908 (umbrella). The byte/short narrowing ANSI bodies were 5 lines each across 8 codegen call sites; this PR collapses them to one line per call site, matching the int/long target work merged in #55934.

### Does this PR introduce _any_ user-facing change?

No. The compiled behavior is identical; only the emitted Java source text changes.

### How was this patch tested?

```
build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite *CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite"
```

307/307 pass.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.x

Closes #55935 from gengliangwang/SPARK-56910-cast-byte-short.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 6165bb0)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants